Top 500 Indian Cities

Table of Contents

Dataset story

Domain situation

An Indian charity aims to identify the most and least populous Indian cities and to compare their male and female graduates, literates, and children. The comparison helps prioritize the cities that most need investment in education, such as building schools and providing them with the resources and technology they need. The organization then analyzes the visual comparisons shown below to support its decisions about which city should be developed first.

What

We use a dataset of type table.

Attributes:

• 'name_of_city': Name of the city (categorical, string values, 493 levels)

• 'state_code': State code of the city (numeric, integer values, range from 1 to 35)

• 'state_name': State name of the city (categorical, string values, 29 levels)

• 'dist_code': Code of the district the city belongs to (numeric, integer values, range from 1 to 99)

• 'population_total': Total population (numeric, integer values, range from 100036 to 12478447)

• 'population_male': Male population (numeric, integer values, range from 50201 to 6736815)

• 'population_female': Female population (numeric, integer values, range from 45126 to 5741632)

• '0-6_population_total': Total population aged 0-6 (numeric, integer values, range from 6547 to 1209275)

• '0-6_population_male': Male population aged 0-6 (numeric, integer values, range from 3406 to 647938)

• '0-6_population_female': Female population aged 0-6 (numeric, integer values, range from 3107 to 561337)

• 'literates_total': Total literates (numeric, integer values, range from 56998 to 10237586)

• 'literates_male': Male literates (numeric, integer values, range from 34751 to 5727774)

• 'literates_female': Female literates (numeric, integer values, range from 22247 to 4509812)

• 'sex_ratio': Sex ratio (numeric, integer values, range from 700 to 1093)

• 'child_sex_ratio': Sex ratio for ages 0-6 (numeric, integer values, range from 762 to 1185)

• 'effective_literacy_rate_total': Literacy rate over age 7 (numeric, float values, range from 49.51 to 98.8)

• 'effective_literacy_rate_male': Male literacy rate over age 7 (numeric, float values, range from 52.27 to 99.3)

• 'effective_literacy_rate_female': Female literacy rate over age 7 (numeric, float values, range from 46.45 to 98.31)

• 'location': Latitude and longitude of the city (categorical, string values, 490 levels)

• 'total_graduates': Total number of graduates (numeric, integer values, range from 2532 to 2221137)

• 'male_graduates': Male graduates (numeric, integer values, range from 1703 to 1210040)

• 'female_graduates': Female graduates (numeric, integer values, range from 829 to 1011097)

Items: each row represents one item (a city).

Why

1) Distribution plots of literacy rates (total, male, and female)? [Salma]

2) Cities with the highest sex ratio (top 20), on a map?

3) Relations between some columns?

4) How to find repeated values in each column?

5) Comparison of the number of cities in each state (static)? [Ayya]

6) Is there any relation between sex ratio and literacy rates? Using a scatterplot?

7) What is the percentage of each state_name in the first 30 rows of the data?

8) Cities with the highest literacy rates (top 20)?

9) Graduates' distribution in the top 20 cities, total (ordered by number of graduates, descending)? Using a barplot? [Rahma]

10) Graduates' distribution in the top 20 cities, male and female (ordered by number of graduates, descending)? Using a barplot

11) Graduates' distribution in the top 20 cities, total, male, and female (ordered by number of graduates, descending)? Using a barplot

12) Top 10 states with the highest population? Using a barplot

13) Is there any relationship between columns?

14) Male vs. female graduates in each city (static) [Ahmed]

15) Comparison between total population, total population aged 0-6, total graduates, and total literates in each city (static)

16) Cities with the highest sex ratio (top 20)

The plots we use, and the reasons for choosing them, are discussed below.

How

Data Wrangling

What is the structure of your dataset?

The dataset has 493 rows and 22 columns describing the Top 500 Indian Cities.

What is/are the main feature(s) of interest in your dataset?

The main goal is to explore all the available information about the Top 500 Indian Cities.

What features in the dataset do you think will help support your investigation into your feature(s) of interest?

name_of_city, state_code, state_name, dist_code, population_total, population_male, population_female, 0-6_population_total, 0-6_population_male, 0-6_population_female, literates_total, literates_male, literates_female, sex_ratio, child_sex_ratio, effective_literacy_rate_total, effective_literacy_rate_male, effective_literacy_rate_female, location, total_graduates, male_graduates, female_graduates

Load dataset

Data Cleaning

We split the values in the location column into latitude and longitude, stored in two new columns called lat and lon, so that they can be used in maps.

We then change the data type of the lat and lon columns from object to numeric, because the object data type cannot be used for plotting on maps.
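The cleaning step above could be sketched as follows. This is a minimal illustration, assuming pandas and assuming the location strings have the form "lat,lon"; the two rows are made-up toy values, not rows from the dataset.

```python
import pandas as pd

# Toy rows for illustration only; the city names and coordinates are made up.
df = pd.DataFrame({
    "name_of_city": ["City A", "City B"],
    "location": ["30.15,74.20", "19.67,78.53"],
})

# Split each "lat,lon" string into two separate columns.
parts = df["location"].str.split(",", expand=True)

# Convert from object (string) to numeric; unparseable entries become NaN.
df["lat"] = pd.to_numeric(parts[0], errors="coerce")
df["lon"] = pd.to_numeric(parts[1], errors="coerce")

print(df.dtypes)
```

Using errors="coerce" is one possible choice: it keeps the conversion from failing on a malformed entry, at the cost of silently producing missing values that should be checked afterwards.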

Mahmoud Wael

What is the number of cities in each state? (static)

We use a barplot because it displays the relationship between a numeric and a categorical variable.

The state UTTAR PRADESH has the highest number of cities.
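The counts behind such a barplot could be computed along these lines. This is a sketch assuming pandas; the four rows are toy data with hypothetical city names, chosen only to show the mechanics.

```python
import pandas as pd

# Toy data for illustration only; city names are hypothetical.
df = pd.DataFrame({
    "name_of_city": ["A", "B", "C", "D"],
    "state_name": ["UTTAR PRADESH", "UTTAR PRADESH", "KERALA", "UTTAR PRADESH"],
})

# Each row is one city, so counting rows per state gives cities per state.
# value_counts() returns the counts sorted in descending order.
cities_per_state = df["state_name"].value_counts()

print(cities_per_state)
```

The first index entry of the result is the state with the most cities, which is the value a barplot of these counts would show as its tallest bar.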

Is there any relation between sex ratio and female literacy?

The figure shows a linear relationship between sex ratio and female literacy.

Is there any relation between sex ratio and male literacy?

The figure shows a linear relationship between sex ratio and male literacy.
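One way to put a number on how linear such a relationship is would be the Pearson correlation coefficient. The sketch below assumes pandas and uses made-up toy values, so the resulting coefficient says nothing about the real dataset.

```python
import pandas as pd

# Toy values for illustration only; not taken from the dataset.
df = pd.DataFrame({
    "sex_ratio": [850, 900, 950, 1000],
    "effective_literacy_rate_female": [70.0, 78.0, 74.0, 85.0],
})

# Pearson correlation: +1 is a perfect increasing line,
# -1 a perfect decreasing line, and 0 no linear relation.
r = df["sex_ratio"].corr(df["effective_literacy_rate_female"])

print(round(r, 3))
```

The same call with effective_literacy_rate_male would give the corresponding figure for male literacy.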

Top 20 cities by population, on a map?

Cities with the highest literacy rates (top 20), on a map?

Cities with the highest sex ratio (top 20), on a map?

Salma Mahmoud

Relations between some columns

We use a SPLOM (scatterplot matrix) because we want to see the pairwise relations between the variables ("population_total", "0-6_population_total", "effective_literacy_rate_total", "total_graduates"), with the points colored by the dataframe column name_of_city.

This gives the pairwise relations between the variables ("population_total", "0-6_population_total", "effective_literacy_rate_total", "total_graduates").

How to find repeated values in each column

For this question we use a histogram for each column, because the histogram is the most commonly used graph for showing frequency distributions.

Finally, we found the number of frequent values in each column.
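As a numeric complement to the histograms, repeated values can also be listed directly. The sketch below assumes pandas and uses toy data; the dictionary name repeated is a hypothetical helper introduced only for this illustration.

```python
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "state_name": ["KERALA", "KERALA", "PUNJAB"],
    "sex_ratio": [900, 950, 900],
})

# For every column, keep only the values that occur more than once,
# together with how often they occur.
repeated = {}
for col in df.columns:
    counts = df[col].value_counts()
    repeated[col] = counts[counts > 1].to_dict()

print(repeated)
```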

Distribution plots of literacy rates (Total, male & female)?

We found that the literacy rate of males is higher than that of females.

Male vs. female population in the top 20 cities?

Rahma Atef

Graduates' distribution in the top 20 cities, total (ordered by number of graduates, descending)? Using a barplot

Graduates' distribution in the top 20 cities, male and female (ordered by number of graduates, descending)? Using a barplot

Graduates' distribution in the top 20 cities, total, male, and female (ordered by number of graduates, descending)? Using a barplot
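Selecting the cities for these barplots could look roughly like this. The sketch assumes pandas and uses three toy rows with hypothetical city names, so it takes the top 2 rather than the top 20.

```python
import pandas as pd

# Toy data for illustration only; city names and counts are made up.
df = pd.DataFrame({
    "name_of_city": ["A", "B", "C"],
    "total_graduates": [500, 1500, 1000],
    "male_graduates": [300, 800, 550],
    "female_graduates": [200, 700, 450],
})

# nlargest returns the rows with the largest values, already in
# descending order; with the full dataset n would be 20.
top = df.nlargest(2, "total_graduates")

print(top[["name_of_city", "total_graduates", "male_graduates", "female_graduates"]])
```

Because the male and female columns travel with the selected rows, the same top frame can feed the total, the male and female, and the combined barplots.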

Is there any relationship between columns?

We use a heatmap because it is a two-dimensional visual representation of the data and makes the relations between columns easier to see.

The heatmap shows that there is a relation between columns.
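A heatmap of this kind is commonly drawn from a correlation matrix. The sketch below shows one way to compute such a matrix, assuming pandas; the rows are toy values and the choice of Pearson correlation is an assumption, since the text does not say which measure was plotted.

```python
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "name_of_city": ["A", "B", "C", "D"],
    "population_total": [100, 200, 300, 400],
    "total_graduates": [10, 25, 28, 45],
    "effective_literacy_rate_total": [80.0, 75.5, 90.1, 85.2],
})

# Correlation is only defined for numeric columns, so drop the rest first.
numeric = df.select_dtypes(include="number")

# Pairwise Pearson correlations; the result is a square, symmetric table
# with 1.0 on the diagonal.
corr = numeric.corr()

print(corr.round(2))
```

A plotting library can then color each cell of corr by its value, for example with seaborn's heatmap function.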

Top 10 states with the highest population? Using a barplot

For this question we use a barplot, which displays the relationship between a numeric and a categorical variable. Here we want the relation between state_name (categorical) and population_total (numeric).
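The aggregation behind this barplot could be sketched as below. It assumes pandas and uses toy rows with hypothetical state labels, so it takes the top 2 states rather than the top 10.

```python
import pandas as pd

# Toy data for illustration only; state labels are hypothetical.
df = pd.DataFrame({
    "state_name": ["S1", "S1", "S2", "S3"],
    "population_total": [100, 250, 300, 50],
})

# Sum the city populations within each state.
state_pop = df.groupby("state_name")["population_total"].sum()

# Keep the largest totals in descending order; with the full dataset n would be 10.
top_states = state_pop.nlargest(2)

print(top_states)
```

Note that this sums only the cities present in the table, so the totals describe the listed cities of each state rather than the whole state.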

Ayya Abdelaziz

Is there any relation between sex ratio and literacy rates? Using a scatterplot

Is there any relation between sex ratio and female literacy rates? Using a pie chart

Among the top 10 states by population, MAHARASHTRA has the largest population and RAJASTHAN has the lowest.

Cities with the highest literacy rates (top 20)?

Cities with the highest sex ratio (top 20)

Ahmed Gamal

Male vs. female graduates in each city (static)

Comparison of the number of cities in each state (static)

Comparison between total population, total population aged 0-6, total graduates, and total literates in each city (static)